Machine Translation of Film Subtitles from English to Spanish: Combining a Statistical System with Rule-based Grammar

Authors

  • M. Volk
  • Jeanette Isele
Abstract

In this project we combined a statistical machine translation (SMT) system for translating film subtitles from English to Spanish with rule-based grammar checking. First, we trained the best possible SMT system with the available training data. The largest part of the training corpus consists of freely available amateur subtitles; a smaller part consists of professionally translated subtitles provided by subtitling companies. Next, we developed, applied and evaluated the grammar checker. We investigated whether combining a statistical system with a rule-based grammar checker is worthwhile and how the results can be improved. For the trained SMT system, applying the grammar checker is advisable, especially to correct agreement errors between nouns, articles and adjectives. The precision of the grammar checker is very satisfactory. With additional linguistic information, such as syntactic information, we could probably improve the grammar checker and extend it to correct other kinds of errors. In addition, the evaluation showed that improving the SMT system significantly reduces the number of errors under consideration. We also worked out various ways in which the SMT system can be improved. One could therefore examine whether such improvements yield a significant decrease in the number of errors. If so, we have to ask whether the additional use of a grammar checker is still worthwhile or whether the number of grammatical errors under consideration has become too low. Finally, we compare the performance of the trained SMT system with the state-of-the-art performance of the SUMAT project for the automatic translation of film subtitles from English to Spanish.
According to automatic evaluation scores, the system we trained in our project was slightly better than the system of the SUMAT project. This result shows that using freely available amateur subtitles to train a statistical machine translation system for professional subtitles is advisable, even though the quality of the amateur subtitles is not optimal.
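
The kind of error the grammar checker targets, disagreement between nouns, articles and adjectives, can be illustrated with a minimal sketch. This is not the authors' checker: the toy lexicon `LEXICON`, the `Entry` type and the function `find_disagreements` are hypothetical names introduced here, and the adjacency-only matching is a simplifying assumption. It also ignores real exceptions in Spanish, such as feminine nouns beginning with stressed a- that take "el" ("el agua").

```python
from typing import NamedTuple


class Entry(NamedTuple):
    pos: str     # "DET", "NOUN" or "ADJ"
    gender: str  # "m" or "f"
    number: str  # "sg" or "pl"


# Hypothetical toy lexicon; a real checker would use a full morphological lexicon.
LEXICON = {
    "el": Entry("DET", "m", "sg"),
    "la": Entry("DET", "f", "sg"),
    "los": Entry("DET", "m", "pl"),
    "las": Entry("DET", "f", "pl"),
    "casa": Entry("NOUN", "f", "sg"),
    "casas": Entry("NOUN", "f", "pl"),
    "coche": Entry("NOUN", "m", "sg"),
    "coches": Entry("NOUN", "m", "pl"),
    "rojo": Entry("ADJ", "m", "sg"),
    "roja": Entry("ADJ", "f", "sg"),
    "rojos": Entry("ADJ", "m", "pl"),
    "rojas": Entry("ADJ", "f", "pl"),
}


def find_disagreements(sentence: str) -> list[tuple[str, str]]:
    """Return (modifier, noun) pairs whose gender or number disagree.

    Assumption: only a determiner directly before a noun and an
    adjective directly after it are checked.
    """
    tokens = sentence.lower().split()
    errors = []
    for i, token in enumerate(tokens):
        noun = LEXICON.get(token)
        if noun is None or noun.pos != "NOUN":
            continue
        # Position i-1 may hold a determiner, position i+1 an adjective.
        for j, expected_pos in ((i - 1, "DET"), (i + 1, "ADJ")):
            if not 0 <= j < len(tokens):
                continue
            modifier = LEXICON.get(tokens[j])
            if modifier is None or modifier.pos != expected_pos:
                continue
            if (modifier.gender, modifier.number) != (noun.gender, noun.number):
                errors.append((tokens[j], token))
    return errors


if __name__ == "__main__":
    print(find_disagreements("el casa roja"))
    print(find_disagreements("las casas rojo"))
    print(find_disagreements("los coches rojos"))
```

Checking only adjacent words keeps the rule precise at the cost of recall, which fits the abstract's remark that syntactic information would be needed to cover further error types.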

Similar resources

Using Linguistic Annotations in Statistical Machine Translation of Film Subtitles

Statistical Machine Translation (SMT) has been successfully employed to support translation of film subtitles. We explore the integration of Constraint Grammar corpus annotations into a Swedish–Danish subtitle SMT system in the framework of factored SMT. While the usefulness of the annotations is limited with large amounts of parallel data, we show that linguistic annotations can increase the g...

A Hybrid Machine Translation System Based on a Monotone Decoder

In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...

The Automatic Translation of Film Subtitles. A Machine Translation Success Story?

Every so often one hears the complaint that 50 years of research in Machine Translation (MT) has not resulted in much progress, and that current MT systems are still unsatisfactory. A closer look reveals that web-based general-purpose MT systems are used by thousands of users every day. And, on the other hand, special-purpose MT systems have been in long-standing use and work successfully in pa...

Strategies Used in the Translation of Interlingual Subtitling

This study was an attempt to identify the interlingual strategies employed to translate English subtitles into Persian and to determine their frequency, as well. Contrary to many countries, subtitling is a new field in Iran. The study, a corpus-based, comparative, descriptive, non-judgmental analysis of an English-Persian parallel corpus, comprised English audio scripts of five movies of differ...

Chunk-based Grammar Checker for Detection Translated English Sentences

Machine Translation systems expect target language output to be grammatically correct within the frame of proper grammatical category. In Myanmar-English statistical machine translation system, the target language output (English) can often be ungrammatical. To solve this need, we propose an ongoing chunk-based grammar checker for translated English sentences. Most of the typical grammar checke...

Publication date: 2013